Using Directed Graph Based BDMM Algorithm for Chinese Word Segmentation

Authors

  • Yaodong Chen
  • Ting Wang
  • Huowang Chen
Abstract

Word segmentation is a key problem in Chinese text analysis. This paper takes both the word-coverage rate and the sentence-coverage rate into account and, building on the classic Bi-Directed Maximum Match (BDMM) segmentation method, designs a character Directed Graph with ambiguity marks for searching multiple possible segmentation sequences. The method is compared with the classic Maximum Match algorithm and the Omni-segmentation algorithm. The experimental results show that the Directed Graph based BDMM algorithm achieves a higher coverage rate and lower complexity.
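
As a rough illustration of the classic bi-directional maximum match baseline that the abstract builds on, the sketch below runs a forward and a backward maximum match over the same sentence and flags the sentence as ambiguous when the two results disagree. It is not the paper's directed-graph method; the function names, the toy lexicon, and the `max_len` window are assumptions made for this example only.

```python
from typing import List, Set, Tuple


def forward_max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Scan left to right, always taking the longest lexicon word available."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character is the fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Scan right to left, always taking the longest lexicon word available."""
    words = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    words.reverse()  # words were collected from the end of the sentence
    return words


def bdmm(text: str, lexicon: Set[str], max_len: int = 4) -> Tuple[List[str], List[str], bool]:
    """Return both segmentations plus a flag that is True when they differ."""
    fwd = forward_max_match(text, lexicon, max_len)
    bwd = backward_max_match(text, lexicon, max_len)
    return fwd, bwd, fwd != bwd


if __name__ == "__main__":
    # Hypothetical toy lexicon, chosen to produce an overlapping ambiguity.
    toy_lexicon = {"研究", "研究生", "生命", "起源"}
    fwd, bwd, ambiguous = bdmm("研究生命起源", toy_lexicon)
    print(fwd)        # ['研究生', '命', '起源']
    print(bwd)        # ['研究', '生命', '起源']
    print(ambiguous)  # True
```

When the two passes disagree, the sentence contains at least one ambiguous span. A graph-based approach such as the one the abstract describes would presumably keep several candidate sequences at that point rather than committing to one of the two outputs.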

Similar resources

A MMSM-based Hybrid Method for Chinese MicroBlog Word Segmentation

After years of research, Chinese word segmentation has achieved quite high precision on formal-style text. However, segmentation performance is less satisfactory on MicroBlog corpora. In this paper we describe a scheme for Chinese word segmentation for MicroBlog, which integrates character-based and word-based information in the directed graph generated by the MMSM model. Word-level ...

HMM and CRF Based Hybrid Model for Chinese Lexical Analysis

This paper presents the Chinese lexical analysis systems developed by the Natural Language Processing Laboratory at Dalian University of Technology, which were evaluated in the 4th International Chinese Language Processing Bakeoff. The systems adopt an HMM and CRF hybrid model, which combines a character-based model with a word-based model in a directed graph. Both the closed and open t...

Toward Better Chinese Word Segmentation for SMT via Bilingual Constraints

This study investigates building a better Chinese word segmentation model for statistical machine translation. It aims to leverage word boundary information, automatically learned from bilingual character-based alignments, to induce a preferable segmentation model. We propose treating the induced word boundaries as soft constraints to bias the continuous learning of a supervised CRFs mod...

Effective Subsequence-based Tagging for Chinese Word Segmentation

Hai Zhao, Chunyu Kit (Department of Chinese, Translation and Linguistics, City University of Hong Kong, 83 Tat Avenue, Kowloon, Hong Kong SAR, China). Abstract: Research on automatic Chinese word segmentation has been advancing rapidly in recent years, especially since the First International Chinese Word Segmentation Bakeo...

DAG-based Long Short-Term Memory for Neural Word Segmentation

Neural word segmentation has attracted growing research interest for its ability to reduce the effort of feature engineering and to exploit external resources through pre-trained character or word embeddings. In this paper, we propose a new neural model that incorporates word-level information for Chinese word segmentation. Unlike previous word-based models, our model still adopts t...

Publication date: 2005